Abstract

The push algorithm was first proposed by Jeh and Widom [6] in the
context of personalized PageRank computations (although the name "push
algorithm" was actually introduced by Andersen, Chung and Lang in a
subsequent paper [1]). In this note we describe the algorithm at a level of
generality that makes it possible to compute the spectral ranking of any
nonnegative matrix. The main contribution of this note is
that the description is very simple (almost trivial) and requires only a
few elementary linear-algebra computations. Along the way, we give new,
precise ways of estimating the convergence of the algorithm, and describe
some of the contributions of the existing literature, which again turn out
to be immediate when recast in our framework.

1 Introduction 

Let $M$ be an $n \times n$ nonnegative real matrix with entries $m_{xy}$. Without loss of
generality, we assume that $\|M\|_1 = 1$; this means that $M$ is substochastic (i.e.,
its row sums are at most one) and that at least one row has sum one.^1

Equivalently, we can think of the arc-weighted graph $G$ underlying $M$. The
graph has $n$ nodes, and an arc $x \to y$ weighted by $m_{xy}$ if $m_{xy} > 0$. We will
frequently switch between the matrix and the graph view, as linear matters are
better discussed in terms of $M$, but the algorithms we are interested in are more
easily discussed through $G$.

As a guiding example, given a directed graph $G$ with $n$ nodes, $M$ can be
the transition matrix of its natural walk^2, whose weights are $m_{xy} = 1/d^+(x)$,
where $d^+(x)$ is the outdegree of $x$ (the number of arcs going out of $x$).
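As an illustration of the guiding example, the following minimal sketch builds the natural-walk weights from an adjacency list. The function name `natural_walk` and the dictionary-of-rows representation are assumptions made here for illustration; they are not part of the original note.

```python
def natural_walk(successors):
    """Return the natural-walk weights as {x: {y: m_xy}}.

    `successors` maps each node to the list of its successors.
    Dangling nodes get an empty row, so the matrix is only substochastic.
    """
    walk = {}
    for x, out in successors.items():
        row = {}
        for y in out:
            # Each arc out of x carries weight 1 / outdegree(x).
            row[y] = row.get(y, 0.0) + 1.0 / len(out)
        walk[x] = row
    return walk


graph = {0: [1, 2], 1: [2], 2: [0], 3: []}   # node 3 is dangling
M = natural_walk(graph)
```

Every non-dangling row sums to one, while the row of the dangling node 3 is empty.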

^1 If this is not the case, just multiply the matrix by the inverse of the maximum row sum.
The multiplication does not affect the eigenspaces, but now the matrix satisfies the conditions
above. Of course, the values of the damping factor (see further on) have to be adjusted
accordingly.

^2 We make no assumptions on $G$, so some nodes might be dangling (i.e., without successors).
In that case the corresponding rows of $M$ will be zero, so $M$ would be neither stochastic
nor a random walk in a strictly technical sense.

We recall that the spectral radius $\rho(M)$ of $M$ coincides with the largest (in
modulus) of its eigenvalues, and satisfies^3

$$\min_i \|M_i\|_1 \le \rho(M) \le \max_i \|M_i\|_1 = \|M\|_1 = 1,$$

where $M_i$ is the $i$-th row of $M$ (the second inequality is always true; the first
one only for nonnegative matrices).

Let $v$ be a nonnegative vector satisfying $\|v\|_1 = 1$ (i.e., a distribution) and
$\alpha \in [0\,..\,1)$. The spectral ranking^4 of $M$ with preference vector $v$ and damping
factor $\alpha$ is defined by

$$r = (1-\alpha)\,v\,(1-\alpha M)^{-1} = (1-\alpha)\,v \sum_{k \ge 0} \alpha^k M^k.$$

Note that $r$ need not be a distribution, unless $M$ is stochastic.^5 Note also
that the linear operator is defined for $\alpha \in [0\,..\,1/\rho(M))$, but usually estimating
$\rho(M)$ is very difficult. The value $1/\rho(M)$ can actually be attained by a limiting
process which essentially makes the damping disappear [8].
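The series form of the definition can be evaluated directly by truncation. The sketch below is only meant to make the definition concrete: the function name `spectral_ranking`, the dense list-of-rows representation and the fixed number of terms are assumptions, and this is not the method advocated in this note.

```python
def spectral_ranking(M, v, alpha, terms=200):
    """Approximate (1 - alpha) * v * sum_k alpha^k M^k by truncating the series.

    M is a list of rows and v a list of floats (row-vector convention).
    """
    n = len(v)
    term = list(v)                       # holds alpha^k * v * M^k
    total = [0.0] * n
    for _ in range(terms):
        for y in range(n):
            total[y] += (1 - alpha) * term[y]
        # Next term: multiply on the right by alpha * M.
        term = [alpha * sum(term[x] * M[x][y] for x in range(n)) for y in range(n)]
    return total


M = [[0.0, 0.5, 0.5],
     [0.0, 0.0, 1.0],
     [0.0, 0.0, 0.0]]                    # the last node is dangling
ranking = spectral_ranking(M, [1.0, 0.0, 0.0], alpha=0.5)
```

In this small example the resulting vector sums to less than one, which is consistent with the remark that the spectral ranking need not be a distribution when $M$ is not stochastic.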

We start from the following trivial observation: while it is very difficult to
"guess" the spectral ranking $r$ associated with a certain $v$, the inverse
problem is trivial: given $r$,

$$v = \frac{1}{1-\alpha}\,r\,(1-\alpha M).$$

The resulting preference vector v might not be, of course, a distribution (other- 
wise we could obtain any spectral ranking using a suitable preference vector), 
but the equation is always true. 

The observation is trivial, but its consequences are not. For instance, consider
an indicator vector $\chi_x(z) = [x = z]$. If we want to obtain $(1-\alpha)\chi_x$ as
spectral ranking, the associated preference vector $v$ has a particularly simple
form:

$$v = \frac{1}{1-\alpha}(1-\alpha)\chi_x(1-\alpha M) = \chi_x - \alpha\sum_{x \to y} m_{xy}\chi_y. \qquad (1)$$

We remark that in the case of a natural random walk, $m_{xy} = d(x)^{-1}$, which
does not depend on $y$ and can be taken out of the summation. Of course, since
spectral rankings are linear we can obtain $(1-\alpha)\chi_x$ multiplied by any constant
just by multiplying $v$ by the same constant.
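The inverse problem and equation (1) can be checked numerically on a small matrix. The following sketch assumes a dense list-of-rows matrix; the helper name `preference_from_ranking` is hypothetical.

```python
def preference_from_ranking(M, r, alpha):
    """Return v = r (1 - alpha M) / (1 - alpha), using row vectors."""
    n = len(r)
    rM = [sum(r[x] * M[x][y] for x in range(n)) for y in range(n)]
    return [(r[y] - alpha * rM[y]) / (1 - alpha) for y in range(n)]


alpha = 0.5
M = [[0.0, 0.5, 0.5],
     [0.0, 0.0, 1.0],
     [0.0, 0.0, 0.0]]
target = [1 - alpha, 0.0, 0.0]           # (1 - alpha) * chi_0
v = preference_from_ranking(M, target, alpha)
```

The result has a one at node 0 and $-\alpha m_{0y}$ at each successor $y$ of node 0, as equation (1) predicts; it is not a distribution, since it has negative entries.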

^3 We use row vectors, so the $\ell_1$ norm of a matrix is the maximum of the norms of its rows.

^4 "Spectral ranking" is an umbrella name for techniques based on eigenvectors and linear
maps to rank entities; see [8] for a detailed history of the subject, which was studied already
in the late forties.

^5 If $M$ is the natural walk on a graph, $r$ is not exactly PageRank [7], but rather the
pseudorank [5] associated with $v$ and $\alpha$. The pseudorank is not necessarily a distribution,
whereas technically a PageRank vector always is. The distinction is however somewhat blurred
in the literature, where pseudoranks are often used in place of PageRank vectors. If $G$ has no
dangling nodes, the pseudorank is exactly PageRank. Otherwise, there are some differences 
depending on how dangling nodes are patched [4]. 
2 The push algorithm



If the preference vector $v$ is highly concentrated (e.g., an indicator) and $\alpha$ is
not too close to one, most updates done by linear solvers or iterative methods to
compute spectral rankings are useless: either they do not perform any update,
or they update nodes whose final value will end up below the computational
precision.

The push algorithm uses the concentration of modifications to reduce the
computational burden. The fundamental idea appeared first in Jeh and Widom's
widely quoted paper [6], although the notation somewhat obscures the ideas. Berkhin
restated the algorithm in a different and more readable form [2]. Andersen,
Chung and Lang [1] applied a specialised version of the algorithm on symmetric
graphs. All these references apply the idea to PageRank, but the algorithm is
actually an algorithm for the steady state of Markov chains with restart [3],
and it works even with substochastic matrices, so it should be thought of as an
algorithm for spectral ranking with damping.^6

The basic idea is that of keeping track of vectors $p$ (the current approximation)
and $r$ (the residual) satisfying

$$p + (1-\alpha)\,r\,(1-\alpha M)^{-1} = (1-\alpha)\,v\,(1-\alpha M)^{-1}.$$

Initially, $p = 0$ and $r = v$, which makes the statement trivial, but we will
incrementally increase $p$ (and correspondingly reduce $r$).

To this purpose, we will be iteratively pushing^7 some node $x$. A push on $x$
adds $(1-\alpha) r_x \chi_x$ to $p$. Since we must keep the invariant true, we now have to
update $r$. If we think of $r$ as a preference vector, we are just trying to solve the
inverse problem (1): by linearity, if we subtract from $r$

$$r_x\Bigl(\chi_x - \alpha\sum_{x \to y} m_{xy}\chi_y\Bigr),$$

the value $(1-\alpha)\,r\,(1-\alpha M)^{-1}$ will decrease exactly by $(1-\alpha) r_x \chi_x$, preserving
the invariant.

It is not difficult to see why this choice is good: we zero an entry (the $x$-th)
of $r$, and we add small positive quantities to a small (if the graph is sparse) set
of entries (those associated with the successors of $x$), increasing the $\ell_1$ norm of
$p$ by $(1-\alpha) r_x$, and decreasing that of $r$ by at least the same amount (larger
decreases happen on strictly substochastic rows, e.g., dangling nodes). Note
that since we do not create negative entries, it is always true that

$$\|p\|_1 + \|r\|_1 \le 1.$$

Of course, we can easily keep track of the two norms at each update. 
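A single push can be sketched as follows, assuming sparse vectors stored as dictionaries and a matrix stored as a dictionary of rows; the function name `push` and this representation are illustrative choices, not something prescribed by the text.

```python
def push(x, p, r, M, alpha):
    """Push node x: move (1 - alpha) * r[x] to p, spread alpha * r[x] over successors.

    p and r are dicts (sparse row vectors); M maps x to {y: m_xy}.
    """
    rx = r.pop(x, 0.0)                   # zero the x-th entry of the residual
    if rx == 0.0:
        return
    p[x] = p.get(x, 0.0) + (1 - alpha) * rx
    for y, m_xy in M.get(x, {}).items():
        r[y] = r.get(y, 0.0) + alpha * rx * m_xy


M = {0: {1: 0.5, 2: 0.5}, 1: {2: 1.0}, 2: {}}
p, r = {}, {0: 1.0}                      # p = 0 and r = v = chi_0
push(0, p, r, M, alpha=0.5)
```

After the push, the entry of node 0 in the residual is zero, its successors have received $\alpha r_0 m_{0y}$ each, and the sum of the two norms does not exceed one.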

^6 An implementation of the push algorithm for the computation of PageRank is available
as part of the LAW software at http://law.dsi.unimi.it/.

^7 The name is taken from [1]; we find it enlightening.
The error in the estimate is

$$\bigl\|(1-\alpha)\,r\,(1-\alpha M)^{-1}\bigr\|_1 = (1-\alpha)\Bigl\|r\sum_{k\ge0}\alpha^k M^k\Bigr\|_1 \le (1-\alpha)\|r\|_1\sum_{k\ge0}\alpha^k\|M^k\|_1 \le \|r\|_1.$$

Thus, we can control exactly the absolute additive error of the algorithm by
controlling the $\ell_1$ norm of the residual.

It is important to notice that if $M$ is strictly substochastic it might happen
that

$$\bigl\|(1-\alpha)\,v\,(1-\alpha M)^{-1}\bigr\|_1 < 1.$$

If this happens, controlling the $\ell_1$ norm of the residual is actually of little help,
as even in the case of natural walks the norm above can be as small as $1-\alpha$.

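However, since we have the guarantee that $p$ is a nonnegative vector which
approximates the spectral ranking from below, we can simply use

$$\frac{\|r\|_1}{\|p\|_1} \ge \frac{\|r\|_1}{\bigl\|(1-\alpha)\,v\,(1-\alpha M)^{-1}\bigr\|_1}$$

as a measure of relative precision, as

$$\frac{\bigl\|(1-\alpha)\,v\,(1-\alpha M)^{-1} - p\bigr\|_1}{\bigl\|(1-\alpha)\,v\,(1-\alpha M)^{-1}\bigr\|_1} = \frac{\bigl\|(1-\alpha)\,r\,(1-\alpha M)^{-1}\bigr\|_1}{\bigl\|(1-\alpha)\,v\,(1-\alpha M)^{-1}\bigr\|_1} \le \frac{\|r\|_1}{\|p\|_1}.$$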

2.1 Handling pushes 

The order in which pushes are executed can be established in many different
ways. Certainly, to guarantee relative error $\varepsilon$ we need only push nodes $x$ such
that $r_x > \varepsilon\|p\|_1/n$, as if all nodes fail to satisfy the inequality then
$\|r\|_1/\|p\|_1 \le \varepsilon$.

The obvious approach is that of keeping an indirect priority queue (i.e.,
a queue in which the priority of every element can be updated at any time)
containing the nodes satisfying the criterion above (initially, just the support of
$v$) and returning them in order of decreasing $r_x$. Nodes are added to the queue
when their residual is larger than $\varepsilon\|p\|_1/n$. Every time a push is performed, the
residuals of the successors of the pushed node are updated and the queue is notified
of the changes.

While this potentially generates an $O(\log n)$ cost per arc visited (to adjust
the queue), in intended applications the queue is always very small, and pushing
larger values leads to a faster decrease of $\|r\|_1$.
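The priority-based strategy can be sketched as below. Python's `heapq` does not offer an indirect priority queue, so this sketch substitutes a plain binary heap with lazy deletion of stale entries; this substitution, the function name `push_with_priority_queue` and the dictionary representation are assumptions of the example, not the implementation discussed in the text.

```python
import heapq


def push_with_priority_queue(M, v, alpha, eps, n):
    """Push nodes by decreasing residual until none exceeds eps * ||p||_1 / n."""
    p, r = {}, dict(v)
    heap = [(-rx, x) for x, rx in r.items()]
    heapq.heapify(heap)
    p_norm = 0.0
    while heap:
        neg_rx, x = heapq.heappop(heap)
        rx = r.get(x, 0.0)
        if -neg_rx != rx or rx <= eps * p_norm / n:
            continue                     # stale entry, or residual below threshold
        del r[x]
        p[x] = p.get(x, 0.0) + (1 - alpha) * rx
        p_norm += (1 - alpha) * rx
        for y, m_xy in M.get(x, {}).items():
            r[y] = r.get(y, 0.0) + alpha * rx * m_xy
            heapq.heappush(heap, (-r[y], y))   # supersedes older entries for y
    return p, r


M = {0: {1: 0.5, 2: 0.5}, 1: {2: 1.0}, 2: {0: 1.0}}
p, r = push_with_priority_queue(M, {0: 1.0}, alpha=0.5, eps=1e-9, n=3)
```

Lazy deletion keeps the code short at the price of a heap that may temporarily hold several entries per node; a true indirect queue would avoid that.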

An alternative approach is to use a FIFO queue (with the proviso that nodes 
already in the queue are not enqueued again). In this case, pushes are not 
necessarily executed in the best possible order, but the queue has constant-time 
access. 
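The FIFO variant can be sketched in the same setting. The names `push_with_fifo` and `queued` are hypothetical, and the threshold test is the one stated at the beginning of this section.

```python
from collections import deque


def push_with_fifo(M, v, alpha, eps, n):
    """Push nodes in FIFO order until no residual exceeds eps * ||p||_1 / n."""
    p, r = {}, dict(v)
    queue, queued = deque(r), set(r)     # nodes in the queue are never enqueued twice
    p_norm = 0.0
    while queue:
        x = queue.popleft()
        queued.discard(x)
        rx = r.get(x, 0.0)
        if rx <= eps * p_norm / n:
            continue                     # threshold has grown past this residual
        del r[x]
        p[x] = p.get(x, 0.0) + (1 - alpha) * rx
        p_norm += (1 - alpha) * rx
        for y, m_xy in M.get(x, {}).items():
            r[y] = r.get(y, 0.0) + alpha * rx * m_xy
            if y not in queued and r[y] > eps * p_norm / n:
                queue.append(y)
                queued.add(y)
    return p, r


M = {0: {1: 0.5, 2: 0.5}, 1: {2: 1.0}, 2: {0: 1.0}}
p, r = push_with_fifo(M, {0: 1.0}, alpha=0.5, eps=1e-9, n=3)
```

Enqueueing and dequeueing take constant time here, but nodes are not necessarily pushed in order of decreasing residual.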

Some preliminary experiments show that the two approaches are comple- 
mentary, in the sense that in situations where the number of nodes in the queue 
is relatively small, a priority queue significantly reduces the number of pushes,
resulting in a faster computation. However, if the queue becomes large (e.g., 
because the damping factor is close to one), the logarithmic burden at each mod- 
ification becomes tangible, and using a FIFO queue yields a faster computation 
in spite of the higher number of pushes. 

In any case, to reduce the memory footprint for large graphs it is essential 
to keep track of the bijection between the set of visited nodes and an identifier 
assigned incrementally in discovery order. In this way, all vectors involved in the 
computation can be indexed by discovery order, making their size dependent 
just on the size of the visited neighbourhood, and not on the size of the graph. 
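One possible way to maintain such a bijection is sketched below; the class name `DiscoveryIndex` and the example node identifiers are made up for illustration.

```python
class DiscoveryIndex:
    """Assign consecutive identifiers to nodes in the order they are first seen."""

    def __init__(self):
        self.id_of = {}                  # node -> local identifier
        self.node_of = []                # local identifier -> node

    def get(self, node):
        local = self.id_of.get(node)
        if local is None:
            local = len(self.node_of)
            self.id_of[node] = local
            self.node_of.append(node)
        return local


index = DiscoveryIndex()
residual = []                            # indexed by local identifier, grows on demand
for node in (4_000_000, 17, 4_000_000, 923):
    i = index.get(node)
    if i == len(residual):
        residual.append(0.0)
    residual[i] += 0.25
```

The `residual` list has one entry per visited node, regardless of how large the node identifiers of the graph are.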

2.2 Convergence 

There are no published results of convergence for the push algorithm. Andersen,
Chung and Lang provide a bound not for convergence to the pseudorank,
but rather for convergence to the ratio between the pseudorank and the stationary
state of $M$ (which in their case, symmetric graphs, is trivial, as it is
proportional to the degree).

When a priority queue is used to select the nodes to be pushed and the
preference vector is an indicator $\chi_x$, the amount of rank going to $p$ at the first
step is exactly $1-\alpha$. In the following $d(x)$ steps, we will visit either the successors
of $x$, whose residual is $\alpha/d(x)$, or some node with a larger residual, due to the
prioritization in the queue. As a result, the amount of rank going to $p$ will be
at least $\alpha(1-\alpha)$. In general, if $P_x(t)$ is the path function of $x$ (i.e., $P_x(t)$ is the
number of paths of length at most $t$ starting from $x$), after $P_x(t)$ pushes the $\ell_1$
norm of $r$ will be at most $1 - (1-\alpha)\sum_{0\le k\le t}\alpha^k = \alpha^{t+1}$.
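The bound above is a geometric sum. The step below spells out its closed form and instantiates it with values chosen purely for illustration ($\alpha = 0.85$ and a target residual norm of $0.01$, neither of which comes from the text).

```latex
\[
1 - (1-\alpha)\sum_{k=0}^{t}\alpha^k
  = 1 - (1-\alpha)\,\frac{1-\alpha^{t+1}}{1-\alpha}
  = \alpha^{t+1}.
\]
% Illustrative values only: alpha = 0.85, target residual norm 0.01.
\[
\alpha^{t+1} \le 0.01
  \iff t + 1 \ge \frac{\ln 0.01}{\ln 0.85} \approx 28.3 ,
\]
% so t = 28 suffices, since 0.85^{29} is approximately 0.009.
```

Under these assumed values, the bound guarantees a residual norm below $0.01$ after $P_x(28)$ pushes.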

2.3 Some remarks 

Precomputing spectral rankings. Another interesting remark^8 is that if
during the computation we have to perform a push on a node $x$ and we happen
to know the spectral ranking of $x$ (i.e., the spectral ranking with preference vector
$\chi_x$) we can simply zero $r_x$ and add the spectral ranking of $x$ multiplied by the
current value of $r_x$ to $p$. Actually, we could even never push $x$ and just add the
spectral ranking of $x$ multiplied by $r_x$ to $p$ at the end of the computation.

Let us try to make this observation more general. Consider a set $H$ of
vertices whose spectral ranking is known; in other words, for each $x \in H$ the
vector

$$s_x = (1-\alpha)\,\chi_x\,(1-\alpha M)^{-1}$$

is somehow available. At every step of the algorithm, the invariant equation

$$p + (1-\alpha)\,r\,(1-\alpha M)^{-1} = (1-\alpha)\,v\,(1-\alpha M)^{-1}$$

^8 Actually, a translation of Jeh and Widom's approach based on partial vectors, which was
restated by Berkhin under the name hub decompositions [2]. Both become immediate in our
setting.
can be rewritten as follows: let $r'$ be the vector obtained from $r$ after zeroing
all entries in $H$, and let $p' = p + \sum_{x\in H} r_x s_x$. Then clearly

$$p' + (1-\alpha)\,r'\,(1-\alpha M)^{-1} = (1-\alpha)\,v\,(1-\alpha M)^{-1}.$$

Note that

$$\|r'\|_1 = \|r\|_1 - \sum_{x\in H} r_x$$

and

$$\|p'\|_1 = \|p\|_1 + \sum_{x\in H} r_x\,\|s_x\|_1.$$

So we can actually execute the push algorithm keeping track of $p$ and $r$ but
acting as if we (virtually) had $p'$ and $r'$ instead; to this aim, we proceed as
follows:

• we never add nodes in H to the queue; 

• for convergence, we consider the norms of p' and r', as computed above; 

• at termination, we adjust p obtaining p' explicitly. 
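The final adjustment in the list above can be sketched as follows, under the assumption that the known rankings are stored in a dictionary `hub_rankings` mapping each hub to its (sparse) spectral ranking; both that name and `apply_hubs` are hypothetical.

```python
def apply_hubs(p, r, hub_rankings):
    """Return (p', r'): residuals on hubs are folded into p via their known rankings.

    hub_rankings maps each hub x to its spectral ranking s_x, stored as a dict.
    """
    p_prime = dict(p)
    r_prime = {}
    for x, rx in r.items():
        if x in hub_rankings:
            for y, s_xy in hub_rankings[x].items():
                p_prime[y] = p_prime.get(y, 0.0) + rx * s_xy
        else:
            r_prime[x] = rx              # non-hub residual is left untouched
    return p_prime, r_prime


# State after pushing node 0 on the walk 0 -> {1, 2}, 1 -> 2, with alpha = 0.5.
p, r = {0: 0.5}, {1: 0.25, 2: 0.25}
# Node 2 is dangling, so its spectral ranking is (1 - alpha) * chi_2.
p_prime, r_prime = apply_hubs(p, r, {2: {2: 0.5}})
```

In this small example the residual on the hub disappears from $r'$ and its contribution shows up in $p'$, so the norms used for the stopping test change accordingly.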

Berkhin [2] notes that when computing the spectral ranking of $x$ we can use $x$
as a hub after the first push. That is, after the first push we will never enqueue $x$
again. At the end of the computation, we simply multiply the resulting spectral
ranking by $1 + r_x + r_x^2 + \cdots = 1/(1-r_x)$. In this case, $\|r\|_1$ must be divided by
$1-r_x$ to have a better estimate of the actual norm. Preliminary experiments
on web and social graphs show that the reduction in the number of pushes is
very marginal, though.

Patching dangling nodes. Suppose that, analogously to what is usually
done in power-method computations, we want to patch dangling nodes. More
precisely, suppose that we start from a matrix $M$ that has some zero rows (e.g.,
the natural walk of a graph $G$ with dangling nodes), and then we obtain a new
matrix $P$ (for "patched") by substituting each zero row with some distribution
$u$, as yet unspecified.

It is known that avoiding the patch altogether is equivalent to using $u = v$ [5],
modulo a scale factor that is computable starting from the spectral ranking
itself. More generally, if $u$ coincides with the distribution that is being used for
preference, no patching is needed provided that the final result is normalized.

For the general case (where $u$ may not coincide with $v$), we can adapt the
push method described above as follows: we keep track of vectors $p$ and $r$ and
of a scalar $\theta$ representing the amount of rank that went through dangling nodes.
The equation now is

$$p + (1-\alpha)(r + \theta u)(1-\alpha P)^{-1} = (1-\alpha)\,\chi_x\,(1-\alpha P)^{-1}.$$

When $p$ is increased by $(1-\alpha) r_x\chi_x$, we have to modify $r$ and $\theta$ as follows:
• if $x$ is not dangling, we subtract from $r$ the vector

$$r_x\Bigl(\chi_x - \alpha\sum_{x\to y} m_{xy}\chi_y\Bigr);$$

• if $x$ is dangling, we subtract just

$$r_x\chi_x$$

and increase $\theta$ by $\alpha r_x$.

At every computation step the approximation of the spectral ranking will be
given by $p' = p + \theta s$, where $s$ is the spectral ranking of $P$ with preference
vector $u$ and damping factor $\alpha$.^9 As in the case of hubs, we should consider
$\|p'\|_1 = \|p\|_1 + \theta\|s\|_1$ when establishing convergence.
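The modified push can be sketched as follows, again with dictionaries as sparse vectors. The function name `push_patched` is hypothetical, and the sketch assumes that a dangling node is represented by an empty (or missing) row of $M$.

```python
def push_patched(x, p, r, theta, M, alpha):
    """Push x with respect to the patched matrix P; return the updated theta.

    M is the unpatched matrix as {x: {y: m_xy}}; dangling nodes have an empty row.
    """
    rx = r.pop(x, 0.0)
    if rx == 0.0:
        return theta
    p[x] = p.get(x, 0.0) + (1 - alpha) * rx
    row = M.get(x, {})
    if row:                              # x is not dangling: spread over successors
        for y, m_xy in row.items():
            r[y] = r.get(y, 0.0) + alpha * rx * m_xy
    else:                                # x is dangling: the rank goes through u
        theta += alpha * rx
    return theta


M = {0: {1: 1.0}, 1: {}}                 # node 1 is dangling
p, r, theta = {}, {0: 1.0}, 0.0
theta = push_patched(0, p, r, theta, M, alpha=0.5)
theta = push_patched(1, p, r, theta, M, alpha=0.5)
```

The distribution $u$ never appears in the push itself; it only enters through the precomputed vector $s$ when $p' = p + \theta s$ is formed.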
^9 Of course, $s$ must be precomputed using any standard method. If $M$ is the natural walk
of a graph $G$, this is exactly the PageRank vector for $G$ with preference vector $u$.